Supplementary Material for Nonparametric Budgeted Stochastic Gradient Descent

Abstract

1 Notation

We introduce some notation used in this supplementary material. For the regression task, we define $y_{\max} = \max_y |y|$. We further denote the set $S$ as
$$S = \begin{cases} B\left(0, y_{\max}\lambda^{-1/2}\right) & \text{if the L2 loss is used and } \lambda \le 1,\\ \mathbb{R}^D & \text{otherwise,} \end{cases}$$
where $B\left(0, y_{\max}\lambda^{-1/2}\right) = \left\{ w \in \mathbb{R}^D : \|w\| \le y_{\max}\lambda^{-1/2} \right\}$ and $\mathbb{R}^D$ denotes the whole feature space.

We introduce five types of loss functions that can be used in our proposed algorithm, namely the Hinge, Logistic, L2, L1, and $\varepsilon$-insensitive losses. We verify that these loss functions satisfy the necessary condition, that is, $\left\| l'(w; x, y) \right\| \le A\|w\|^{1/2} + B$ for some appropriate positive numbers $A, B$. Without loss of generality, we assume that the feature domain is bounded, i.e., $\|\Phi(x)\| \le 1$ for all $x \in \mathcal{X}$.

• Hinge loss. We have
$$l(w; x, y) = \max\left(0, 1 - y w^\top \Phi(x)\right), \qquad l'(w; x, y) = -\mathbb{I}_{\left\{ y w^\top \Phi(x) \le 1 \right\}}\, y\, \Phi(x).$$
Therefore, by choosing $A = 0$ and $B = 1$ we have
$$\left\| l'(w; x, y) \right\| = \|\Phi(x)\| \le 1 = A\|w\|^{1/2} + B.$$

• L2 loss. In this case we cannot verify at the outset that $\left\| l'(w; x, y) \right\| \le A\|w\|^{1/2} + B$ for all $w, x, y$. However, to support the proposed theory, we only need to check that $\left\| l'(w_t; x, y) \right\| \le A\|w_t\|^{1/2} + B$ for all $t \ge 1$. We derive as follows:
$$l(w; x, y) = \frac{1}{2}\left(y - w^\top \Phi(x)\right)^2, \qquad l'(w; x, y) = \left(w^\top \Phi(x) - y\right)\Phi(x),$$
$$\left\| l'(w_t; x, y) \right\| = \left| w_t^\top \Phi(x) - y \right| \|\Phi(x)\| \le \left| w_t^\top \Phi(x) \right| + y_{\max} \le \|\Phi(x)\|\,\|w_t\| + y_{\max} \le A\|w_t\|^{1/2} + B,$$
where $B = y_{\max}$ and
$$A = \begin{cases} y_{\max}^{1/2}\lambda^{-1/4} & \text{if } \lambda \le 1,\\ y_{\max}^{1/2}(\lambda - 1)^{-1/2} & \text{otherwise.} \end{cases}$$
Here we make use of the fact that $\|w_t\| \le y_{\max}(\lambda - 1)^{-1}$ if $\lambda > 1$ (cf. Thm. 7) and $\|w_t\| \le y_{\max}\lambda^{-1/2}$ otherwise (cf. Line 13 in Alg. 2 and Line 16 in Alg. …
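To make the condition concrete, the short sketch below numerically checks $\|l'(w; x, y)\| \le A\|w\|^{1/2} + B$ for the hinge and L2 losses on random samples with $\|\Phi(x)\| \le 1$ and $\|w\| \le y_{\max}\lambda^{-1/2}$ (the $\lambda \le 1$ case). It is only an illustration: the feature map is taken to be the identity, and the helper names grad_hinge and grad_l2 are not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
D, y_max, lam = 5, 2.0, 0.5          # illustrative dimension, label bound, regularization

def grad_hinge(w, x, y):
    # Subgradient of max(0, 1 - y w^T x): -y x when the margin is violated, else 0.
    return -y * x if y * (w @ x) <= 1 else np.zeros_like(x)

def grad_l2(w, x, y):
    # Gradient of 0.5 * (y - w^T x)^2 with respect to w.
    return (w @ x - y) * x

# Constants A, B from the derivations above (lambda <= 1 case for the L2 loss).
A_l2, B_l2 = np.sqrt(y_max) * lam ** (-0.25), y_max
A_hinge, B_hinge = 0.0, 1.0

for _ in range(1000):
    x = rng.normal(size=D)
    x /= max(1.0, np.linalg.norm(x))                          # enforce ||Phi(x)|| <= 1
    w = rng.normal(size=D)
    w *= min(1.0, y_max * lam ** (-0.5) / np.linalg.norm(w))  # enforce ||w|| <= y_max / sqrt(lam)
    y = rng.uniform(-y_max, y_max)
    y_cls = 1.0 if y >= 0 else -1.0                           # +/-1 label for the hinge loss
    assert np.linalg.norm(grad_hinge(w, x, y_cls)) <= A_hinge * np.linalg.norm(w) ** 0.5 + B_hinge + 1e-9
    assert np.linalg.norm(grad_l2(w, x, y)) <= A_l2 * np.linalg.norm(w) ** 0.5 + B_l2 + 1e-9

print("gradient-norm bounds hold on all sampled (w, x, y)")
```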


Similar articles

Nonparametric Budgeted Stochastic Gradient Descent

One of the most challenging problems in kernel online learning is to bound the model size. Budgeted kernel online learning addresses this issue by bounding the model size to a predefined budget. However, determining an appropriate value for such predefined budget is arduous. In this paper, we propose the Nonparametric Budgeted Stochastic Gradient Descent that allows the model size to automatica...


Breaking the curse of kernelization: budgeted stochastic gradient descent for large-scale SVM training

Online algorithms that process one example at a time are advantageous when dealing with very large data or with data streams. Stochastic Gradient Descent (SGD) is such an algorithm and it is an attractive choice for online Support Vector Machine (SVM) training due to its simplicity and effectiveness. When equipped with kernel functions, similarly to other SVM learning algorithms, SGD is suscept...
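As a concrete illustration of the budgeted approach described in that abstract, the following sketch runs Pegasos-style hinge-loss SGD in an RBF kernel space and, whenever the number of support vectors exceeds a fixed budget, removes the one with the smallest coefficient magnitude. The function names and this simple removal rule are illustrative assumptions, not the exact budget-maintenance strategies evaluated in that paper.

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    # Gaussian RBF kernel between two feature vectors.
    return np.exp(-gamma * np.linalg.norm(a - b) ** 2)

def budgeted_kernel_sgd(X, y, budget=50, lam=0.1, epochs=5):
    sv, alpha = [], []                                         # support vectors and coefficients
    t = 0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)                              # Pegasos-style step size
            f = sum(a * rbf(z, x_i) for a, z in zip(alpha, sv))
            alpha = [(1.0 - eta * lam) * a for a in alpha]     # regularization shrinkage
            if y_i * f < 1:                                    # hinge loss is active
                sv.append(np.asarray(x_i, dtype=float))
                alpha.append(eta * y_i)
            if len(sv) > budget:                               # enforce the budget
                j = int(np.argmin(np.abs(alpha)))              # drop the least important SV
                sv.pop(j)
                alpha.pop(j)
    return sv, alpha

# Toy usage on a non-linearly separable 2-D problem with labels in {-1, +1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.sign(X[:, 0] ** 2 + X[:, 1] ** 2 - 1.0)
sv, alpha = budgeted_kernel_sgd(X, y, budget=30)
print(f"kept {len(sv)} support vectors (budget 30)")
```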


Supplementary material of the CVPR’17 Viraliency: Pooling Local Virality

We implemented our LENA pooling layer within the Caffe framework and ran all our experiments using a Tesla K40 GPU. All the networks were fine-tuned from the convolutional filters obtained when training these networks for the 1,000 image classification task on the ImageNet dataset. We iterated the stochastic gradient descent algorithm for 10,000 iterations with a momentum of μ = 0.9 and a weigh...
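For reference, here is a minimal sketch of the SGD-with-momentum update mentioned above (momentum μ = 0.9). The learning-rate and weight-decay values are placeholders, since the abstract is cut off before stating them.

```python
import numpy as np

def sgd_momentum_step(w, v, grad, lr=0.001, mu=0.9, weight_decay=5e-4):
    # Velocity update with L2 weight decay folded into the gradient,
    # followed by the parameter update (as in Caffe's SGD solver).
    v = mu * v - lr * (grad + weight_decay * w)
    return w + v, v

# Toy usage: one step on the quadratic objective 0.5 * ||w||^2 (gradient = w).
w, v = np.ones(3), np.zeros(3)
w, v = sgd_momentum_step(w, v, grad=w)
print(w, v)
```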


Supplementary Material: Asynchronous Stochastic Gradient Descent with Delay Compensation

where $C_{ij} = \frac{1}{1+\lambda}\left(\frac{u_i u_j \beta}{l_i l_j \sqrt{\alpha}}\right)$, $C'_{ij} = \frac{1}{(1+\lambda)\,\alpha\, l_i l_j}$, and the model converges to the optimal model, then the MSE of $\lambda G(w_t)$ is smaller than the MSE of $G(w_t)$ in approximating the Hessian $H(w_t)$. Proof: For simplicity, we abbreviate $E(Y \mid x, w^*)$ as $E$, $G(w_t)$ as $G_t$, and $H(w_t)$ as $H_t$. First, we calculate the MSE of $G_t$ and $\lambda G_t$ in approximating $H_t$ for each element of $G_t$. We denote the element in the i-th r...


Early Stopping as Nonparametric Variational Inference

We show that unconverged stochastic gradient descent can be interpreted as a procedure that samples from a nonparametric approximate posterior distribution. This distribution is implicitly defined by the transformation of an initial distribution by a sequence of optimization steps. By tracking the change in entropy over these distributions during optimization, we form a scalable, unbiased estim...

Publication date: 2016